Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.
Authors
Abstract
Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation tested the programs BLAST [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448], and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195-197], together with their scoring schemes. For all algorithms, the error rate is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of SSEARCH and FASTA are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by BLAST and WU-BLAST2 exaggerate significance by orders of magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform best, and they detect almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins they do much less well: only half of the relationships between proteins with 20-30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those that are identified may be used with confidence.
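The abstract's central point is that a statistical score (an E-value) separates true from chance matches better than percentage identity or raw score, and that a well-calibrated E-value predicts how many false positives a search will return. The Python sketch below illustrates those quantities under stated assumptions. It uses the Karlin-Altschul form E = K·m·n·exp(-λS) of the BLAST family. FASTA and SSEARCH instead fit their statistical parameters to the scores observed in each search. It also relies on a hypothetical percentage-identity convention and on hypothetical `Hit` records labelled homologous or not (for example, by membership in the same SCOP superfamily). All function names are illustrative and are not taken from the paper or from any of the programs it tests.

```python
import math
from dataclasses import dataclass


def karlin_altschul_e_value(raw_score: float, m: int, n: int,
                            lam: float, k: float) -> float:
    """Expected number of chance hits scoring >= raw_score: E = K * m * n * exp(-lambda * S).

    m is the query length and n the total database length (residues);
    lam and k depend on the substitution matrix and gap costs.
    """
    return k * m * n * math.exp(-lam * raw_score)


def p_from_e(e: float) -> float:
    """Probability of at least one chance hit (Poisson assumption): P = 1 - exp(-E)."""
    return -math.expm1(-e)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of gap-free aligned columns.

    Hypothetical convention: published definitions differ (some divide by
    the alignment length or by the shorter sequence length instead).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


@dataclass
class Hit:
    query: str
    target: str
    e_value: float
    homologous: bool  # assumed label, e.g. query and target in the same SCOP superfamily


def assess_at_threshold(hits: list[Hit], n_queries: int,
                        total_true_pairs: int, threshold: float) -> dict:
    """Compare observed false positives at an E-value cutoff with the number predicted."""
    accepted = [h for h in hits if h.e_value <= threshold]
    false_positives = sum(1 for h in accepted if not h.homologous)
    true_positives = len(accepted) - false_positives
    # A calibrated E-value means each query search yields about `threshold`
    # chance hits at this cutoff, so n_queries * threshold are expected overall.
    expected_false_positives = n_queries * threshold
    coverage = true_positives / total_true_pairs if total_true_pairs else 0.0
    return {
        "false_positives": false_positives,
        "expected_false_positives": expected_false_positives,
        "coverage": coverage,
    }


if __name__ == "__main__":
    # Toy, made-up hits for two queries; not data from the study.
    hits = [
        Hit("d1", "d2", 1e-5, True),
        Hit("d1", "d3", 0.5, False),
        Hit("d4", "d5", 0.001, False),
        Hit("d4", "d6", 2.0, True),
    ]
    print(assess_at_threshold(hits, n_queries=2, total_true_pairs=4, threshold=0.01))
    # prints {'false_positives': 1, 'expected_false_positives': 0.02, 'coverage': 0.25}
```

In this sketch, a method with calibrated statistics gives observed false-positive counts close to `expected_false_positives` across a range of thresholds, which is the agreement the abstract reports for SSEARCH and FASTA. Statistics that exaggerate significance, as reported for BLAST and WU-BLAST2, would instead yield many more false positives than predicted. Binning pairs by `percent_identity` (for example >30% versus 20-30%) and comparing `coverage` per bin mirrors the abstract's observation that sensitivity falls sharply for distant relationships.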
Related articles
Comparison of Sequence Similarity Measures for Distant Evolutionary Relationships
Sequence similarity algorithms are used to reconstruct increasingly large evolutionary trees involving increasingly distant evolutionary relationships. This paper proposes two sequence similarity algorithms, called the Greedy Tiling and the Random Tiling algorithms, that are both based on the idea of tiling one sequence by parts of another sequence. Experimental comparisons show that the new algo...
Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments.
Here, a scalable, accurate, reliable, and robust protein functional site comparison algorithm is presented. The key components of the algorithm consist of a reduced representation of the protein structure and a sequence order-independent profile-profile alignment (SOIPPA). We show that SOIPPA is able to detect distant evolutionary relationships in cases where both a global sequence and structur...
Protein structure similarities.
Comparison of protein structures can reveal distant evolutionary relationships that would not be detected by sequence information alone. This helps to infer functional properties. In recent years, many methods for pairwise protein structure alignment have been proposed and are now available on the World Wide Web. Although these methods have made it possible to compare all available protein stru...
Comparative modeling in CASP5: progress is evident, but alignment errors remain a significant hindrance.
Models for 20 comparative modeling targets were submitted for the fifth round of the "blind" test of protein structure prediction methods (CASP5; http://predictioncenter.llnl.gov/casp5). The modeling approach used in CASP5 was similar to that used 2 years ago in CASP4 (Venclovas, Proteins 2001; Suppl 5:47-54). The main features of this approach include use of multiple templates, initial assessm...
PASS2: A Database of Structure-Based Sequence Alignments of Protein Structural Domain Superfamilies
Sequence alignments guided by structural features are particularly suited for distant relationships and they permit a better sampling of the protein sequence space. Reliable sequence alignments could be useful in evolutionary biology and in defining structure-function relationships for protein superfamilies. PASS2 database presents structure-based alignments of protein domains related at the sup...
Journal:
- Proceedings of the National Academy of Sciences of the United States of America
Volume 95, Issue 11
Publication date: 1998